We present NeRFEditor, an efficient learning framework for 3D scene editing that takes a video captured over 360° as input and outputs a high-quality, identity-preserving stylized 3D scene. Our method supports diverse types of editing, such as editing guided by reference images, text prompts, and user interactions. We achieve this by encouraging a pre-trained StyleGAN model and a NeRF model to learn from each other. Specifically, we use a NeRF model to generate numerous image-angle pairs to train an adjustor, which adjusts the StyleGAN latent code to generate high-fidelity stylized images for any given angle. To extrapolate editing to GAN out-of-domain views, we devise another module that is trained in a self-supervised manner. This module maps novel-view images to the hidden space of StyleGAN, which allows StyleGAN to generate stylized images on novel views. Together, these two modules produce guided images in 360° views, which are used to fine-tune a NeRF to produce the stylization effects; a stable fine-tuning strategy is proposed for this step. Experiments show that NeRFEditor outperforms prior work on benchmark and real-world scenes with better editability, fidelity, and identity preservation.
Translated by Google Translate
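Recent works on image harmonization solve the problem as a pixel-wise image translation task via large autoencoders. They perform unsatisfactorily and infer slowly when dealing with high-resolution images. In this work, we observe that adjusting the input arguments of basic image filters, e.g., brightness and contrast, is sufficient for humans to produce realistic images from composite ones. Hence, we frame image harmonization as an image-level regression problem that learns the arguments of the filters humans use for the task. We present the Harmonizer framework for image harmonization. Unlike prior methods based on black-box autoencoders, Harmonizer contains a neural network for filter argument prediction and several white-box filters (driven by the predicted arguments) for image harmonization. We also introduce a cascade regressor and a dynamic loss strategy so that Harmonizer learns the filter arguments more stably and precisely. Since our network only outputs image-level arguments and the filters we use are efficient, Harmonizer is much lighter and faster than existing methods. Comprehensive experiments demonstrate that Harmonizer surpasses existing methods, especially with high-resolution inputs. Finally, we apply Harmonizer to video harmonization, achieving results that are consistent across frames at 56 fps and 1080p resolution. Code and models are available at: https://github.com/zhkkke/harmonizer.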
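As a rough illustration of what "white-box filters driven by image-level arguments" can look like, here is a minimal sketch in plain Python. It is not the Harmonizer implementation: the function names, the foreground mask, the filter formulas, and the hard-coded argument values are all assumptions made for the example. In the framework described above, a neural network would predict the arguments.

```python
def clamp01(value):
    """Clamp a channel value to the valid [0, 1] range."""
    return min(1.0, max(0.0, value))


def adjust_brightness(pixel, amount):
    """Hypothetical brightness filter: shift every channel by `amount`."""
    return tuple(clamp01(c + amount) for c in pixel)


def adjust_contrast(pixel, factor, pivot=0.5):
    """Hypothetical contrast filter: scale the distance from `pivot`."""
    return tuple(clamp01((c - pivot) * factor + pivot) for c in pixel)


def harmonize(image, mask, brightness, contrast):
    """Apply the two filters to foreground pixels only (mask value 1).

    `image` is a list of rows of (r, g, b) tuples with values in [0, 1];
    `mask` has the same layout with 0/1 entries (an assumption here).
    """
    output = []
    for pixel_row, mask_row in zip(image, mask):
        new_row = []
        for pixel, is_foreground in zip(pixel_row, mask_row):
            if is_foreground:
                pixel = adjust_brightness(pixel, brightness)
                pixel = adjust_contrast(pixel, contrast)
            new_row.append(pixel)
        output.append(new_row)
    return output


# Two pixels: the first is pasted foreground, the second is background.
image = [[(0.2, 0.4, 0.6), (0.5, 0.5, 0.5)]]
mask = [[1, 0]]
# Stand-ins for network-predicted, image-level filter arguments.
result = harmonize(image, mask, brightness=0.1, contrast=1.5)
print(result)
```

Because only two scalars are applied per image, the cost of the filtering step grows linearly with resolution and does not depend on the size of the predictor, which matches the efficiency argument made in the abstract.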
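We propose and study a novel task named Blind Image Decomposition (BID), which requires separating a superimposed image into its constituent underlying images in a blind setting, that is, both the source components involved in the mixing and the mixing mechanism are unknown. For example, rain may consist of multiple components, such as rain streaks, raindrops, snow, and haze. Rainy images can be treated as an arbitrary combination of some or all of these components. Decomposing superimposed images, such as rainy images, into distinct source components is a crucial step toward real-world vision systems. To facilitate research on this new task, we construct multiple benchmark datasets, including mixed image decomposition across multiple domains, real-scenario deraining, and joint shadow/reflection/watermark removal. Moreover, we propose a simple yet general Blind Image Decomposition Network (BIDeN) to serve as a strong baseline for future work. Experimental results demonstrate the tenability of our benchmarks and the effectiveness of BIDeN.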
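Recently, the development of machine learning (ML) potentials has made it possible to perform large-scale and long-time molecular simulations with the accuracy of quantum mechanical (QM) models. However, for high-level QM methods, such as density functional theory (DFT) at the meta-GGA level and/or with exact exchange, quantum Monte Carlo, etc., generating a sufficient amount of training data remains computationally challenging because of their high cost. In this work, we demonstrate that this issue can be largely alleviated with Deep Kohn-Sham (DeePKS), an ML-based DFT model. DeePKS employs a computationally efficient, neural-network-based functional model to construct a correction term added on top of a cheap DFT model. Once trained, DeePKS yields energies and forces that closely match those of the high-level QM method, while the amount of training data it requires is orders of magnitude smaller than that needed to train a reliable ML potential. DeePKS can therefore serve as a bridge between expensive QM models and ML potentials: one can generate a modest amount of high-accuracy QM data to train a DeePKS model, and then use the DeePKS model to label a much larger number of configurations for training an ML potential. This scheme for periodic systems is implemented in the DFT package ABACUS, which is open source and ready for use in various applications.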
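Federated learning (FL) trains a global model across many decentralized users, each with a local dataset. Compared with traditional centralized learning, FL does not require direct access to local datasets and thus aims to mitigate data privacy concerns. However, data privacy leakage in FL still exists because of inference attacks, including membership inference, property inference, and data inversion. In this work, we propose a new type of privacy inference attack, coined the Preference Profiling Attack (PPA), which accurately profiles the private preferences of a local user, e.g., the most liked (disliked) items from the client's online shopping and the most common expressions in the user's selfies. In general, PPA can profile the top-k preferences (k = 1, 2, 3, and k = 1 in particular) contingent on the characteristics of the local client (user). Our key insight is that the gradient variation of a local user's model has a distinguishable sensitivity to the sample proportion of a given class, especially the majority (minority) class. By observing a user model's gradient sensitivity to a class, PPA can profile the sample proportion of that class in the user's local dataset and thereby expose the user's preference for the class. The inherent statistical heterogeneity of FL further facilitates PPA. We extensively evaluate the effectiveness of PPA using four datasets (MNIST, CIFAR10, RAF-DB, and Products-10K). Our results show that PPA achieves 90% and 98% top-1 attack accuracy on MNIST and CIFAR10, respectively. More importantly, in real-world commercial scenarios of shopping (i.e., Products-10K) and social networking (i.e., RAF-DB), PPA achieves a top-1 attack accuracy of 78% in the former case when inferring the most ordered items (i.e., acting as a commercial competitor), and 88% in the latter case when inferring a victim user's most frequent facial expression, e.g., disgust.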
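Monocular 3D object detection is very challenging in autonomous driving because of the lack of depth information. This paper proposes a one-stage monocular 3D object detection algorithm based on multi-scale depth stratification, which uses an anchor-free method to detect 3D objects in a per-pixel prediction. In the proposed MDS-Net, a novel depth-based stratification structure is developed to improve the network's depth prediction ability by establishing a mathematical model between the depth and the image size of objects. A new angle loss function is then developed to further improve the accuracy of angle prediction and to speed up the convergence of training. Finally, an optimized soft-NMS is applied in the post-processing stage to adjust the confidence of candidate boxes. Experiments on the KITTI benchmark show that MDS-Net outperforms existing monocular 3D detection methods in 3D detection and BEV detection tasks while meeting real-time requirements.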
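For readers unfamiliar with the post-processing step mentioned above, the following is a minimal sketch of standard Gaussian Soft-NMS on 2D axis-aligned boxes. It is not the optimized variant used in MDS-Net, whose details the abstract does not give; the box format, the `sigma` value, and the score threshold are assumptions chosen for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of deleting boxes.

    Returns (box, score) pairs in the order in which they were selected.
    """
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Select the candidate with the highest current score.
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best)
        kept.append((best_box, best_score))
        # Decay the remaining scores according to their overlap with it.
        rescored = []
        for box, score in candidates:
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_threshold:
                rescored.append((box, score))
        candidates = rescored
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
for box, score in soft_nms(boxes, scores):
    print(box, round(score, 4))
```

In this example the second box overlaps heavily with the first, so its confidence is reduced and it drops below the isolated third box, rather than being discarded outright as in hard NMS.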
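In this work, we present DCC (Deeper Compatible Compression), an enabling technique for real-time, drone-sourced, edge-assisted video analytics that is built on top of existing codecs. DCC tackles an important technical problem: compressing the video streamed from the drone to the edge without sacrificing the accuracy and timeliness of the video analytics tasks performed at the edge. DCC is motivated by the observation that not every bit in the streamed video is equally valuable to video analytics, which opens up new room for compression beyond conventional, analytics-oblivious video codecs. We exploit drone-specific context and intermediate hints from object detection to pursue the adaptive fidelity needed to retain analytics quality. We have prototyped DCC in a showcase vehicle detection application and validated its efficiency in representative scenarios. DCC achieves a 9.5-fold reduction over the baseline approach, and a 19-683% reduction over the state of the art, at comparable detection accuracy.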
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of the multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from a trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent, neural-network-based PDE solver, suggest that this challenge can be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and on ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest that KoopmanLab has the potential to be used in diverse applications of partial differential equations.
Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection for a single ranking within a set of rankings, or for pairwise comparisons of a ranking, under $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting ranks. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm that generates synthetic rankings while satisfying $\epsilon$-ranking differential privacy. We establish theoretical results on the utility of synthetic rankings in downstream tasks, including the inference attack and the personalized ranking task. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
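To make the Mallows model mentioned above concrete, here is a minimal sketch that samples a ranking from a Mallows distribution under the Kendall tau distance using the standard repeated insertion construction. This is not the paper's multistage algorithm; the function names and the dispersion parameter `theta` are assumptions for illustration, and how `theta` relates to $\epsilon$ is specified by the paper, not by this sketch.

```python
import math
import random


def kendall_tau_distance(a, b):
    """Count the item pairs that rankings `a` and `b` order differently."""
    position_in_b = {item: index for index, item in enumerate(b)}
    seq = [position_in_b[item] for item in a]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def sample_mallows(center, theta, rng):
    """Draw one ranking with P(r) proportional to exp(-theta * d(r, center)).

    Repeated insertion: the i-th item of `center` (0-based) is inserted at
    index j in 0..i, which creates i - j inversions relative to `center`,
    so index j is chosen with weight exp(-theta) ** (i - j).
    """
    phi = math.exp(-theta)
    ranking = []
    for i, item in enumerate(center):
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, item)
    return ranking


rng = random.Random(0)
true_ranking = ["a", "b", "c", "d", "e"]
for theta in (0.0, 1.0, 5.0):
    synthetic = sample_mallows(true_ranking, theta, rng)
    print(theta, synthetic, kendall_tau_distance(synthetic, true_ranking))
```

With `theta = 0` every ranking is equally likely, so the sample reveals nothing about the center ranking; larger `theta` concentrates samples near the center, which is the privacy-utility trade-off that the abstract quantifies through $\epsilon$.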